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Abstract 

In-place associative integer sorting technique was proposed for integer lists which 



C/3 " requires only constant amount of additional memory replacing bucket sort, distribu- 

. tion counting sort and address calculation sort family of algorithms. The technique 

, ^ ' was explained by the analogy with the three main stages in the formation and re- 

. trieval of memory in cognitive neuroscience which are (i) practicing, (ii) storing and 

^ \ (hi) retrieval. 

■ In this study, the technique is specialized with two variants one for read-only 

integer keys and the other for modifiable integers. Hence, a novel algorithm is 
obtained that does not require additional memory other than a constant amount 



and sorts faster than all no matter how large is the list provided that m = 0{n log n) 
where m is the range and n is the number of keys (or integers). 

1 Introduction 

The main difficulties of all distributive sorting algorithms is that, when the keys are 
distributed using a hash function according to their content, several keys may be clustered 
around a loci, and several may be mapped to the same location. These problems are 
solved by inherent three basic steps of associative sort [T] (i) practicing, (ii) storing and 
(hi) retrieval which are the three main stages in the formation and retrieval of memory in 
cognitive neuroscience. The technique assumes that associations are between the integers 
in the list space and the nodes in an imaginary linear subspace that spans a predefined 
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interval of range of integers. The imaginary subspace can be denned anywhere on the list 
space S[0 . . .n — 1] provided that its boundaries do not cross over that of the list. The 
range of the interval that the imaginary subspace spans is upper bounded with the number 
of integers n but may be smaller and can be located anywhere making the technique in- 
place, i.e., beside the input list, only a constant amount of memory locations are used for 
storing counters and indices. Furthermore, this definition reveals the asymptotic power 
of the technique with increasing n with respect to the range of integers, as well. 

An association between a integer and the imaginary subspace is created by a node using 
a monotone bijective hash function that maps the integers in the predefined interval to 
the imaginary subspace. The process of creating a node by mapping a distinct integer to 
the imaginary subspace is "practicing a distinct integer of an interval". Since imaginary 
subspace is defined on the list space, this is just swapping. Once a node is created, 
the redundancy due to the association between the integer and the position of the node 
releases the word allocated for the integer in the physical memory except one bit which 
tags the word as a node for interrogation. All the bits of the node except the tag bit 
can be cleared and used to encode any information. Hence, they are the "record" of 
the node and the information encoded into a record is the "cue" by which cognitive 
neuro-scientists try to describe how the brain recalls the next item in the order during 
retrieval. For instance, it will be foreknown from the tag bit that a node has already 
been created while another occurrence of that particular integer is being practiced giving 
the opportunity to count other occurrences. The process of counting other occurrences 
of a particular integer is "practicing all the integers of an interval", i.e., rehearsing used 
by cognitive neuro-scientists to describe how the brain manipulates the sequence before 
storing in short (or long) term memory. On the other hand, the tag bit discriminates the 
word as node and the position of the node lets the integer be retrieved back from the 
imaginary subspace using the inverse hash function. 

Practicing does not need to alter the value of other occurrences, i.e., only the first 
occurrence is altered while being practiced from where a node is created. All other 
occurrences of that particular integer remain in the list space but become meaningless. 
Hence they are "idle integers" . On the other hand, practicing does not need to alter the 
position of idle integers as well, unless another distinct integer creates a node exactly 
at the position of an idle integer while being practiced. In such a case, the idle integer 



2 



is moved to the former position of the integer that creates the new node. This makes 
associative sort unstable, i.e., equal integers may not retain their original relative order. 

Once all the integers in the predefined interval are practiced, the nodes that are dis- 
persed in the imaginary subspace with relative order are clustered in a systematic way, 
i.e., the distance between the nodes are closed to a direction retaining their relative order. 
This is the storing phase of associative sort where the received, processed and combined 
information to construct the sorted permutation of the practiced interval is stored in the 
short-term memory. When the nodes are moved towards a direction, it is not possible 
to retain the association between the imaginary subspace and list space. However, the 
record of a node can be further used to encode the absolute position of that node as well, 
or maybe the relative position or how much that node is moved relative to its absolute or 
relative position during storing. Unfortunately, this requires that a record is enough to 
store both the position of the node and the number of idle integers practiced by that node. 
However, as explained earlier, further associations can be created using the idle integers 
that were already practiced by manipulating either their position or value or both. Hence, 
if the record is enough, it can store both the positional information and the number of 
idle integers. If not, an idle integer can be associated accompanying the node to supply 
additional space for it for the positional information. 

Finally, the sorted permutation of the practiced interval is constructed in the list 
space, using the stored information in the short-term memory. This is the retrieval phase 
of associative sort that depends on the information encoded into the record of a node. 
If the record is enough, it stores both the position of the node and the number of idle 
integers. If not, an associated idle integer accompanying the node stores the position of 
the node while the record holds the number of idle integers. The positional information 
cues the recall of the integer using the inverse hash function. This is "integer retrieval" 
from imaginary subpace. Hence, the retrieved integer can be copied on the list space as 
much as it occurrs. 

Hence, moving through nodes that represent the start and end of practiced integers 
as well as retaining their relative associations with each other even when their positions 
are altered by cuing allow the order of integers to be constructed in linear time in-place. 

From complexity point of view, associative sort shows similar characteristics with 
bucket sort 013] and distribution counting sort [UEj. It sorts n integers S[0 . . . n — 1] each 
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in the range [0, m — 1] using 0(1) extra space in 0(n + m) time for the worst, (D(m) time 
for the average (uniformly distributed integers) and 0(n) time for the best case. The 
ratio — defines the efficiency (time-space trade-offs) of the algorithm letting very larges 
lists to be sorted in-place. 

1.1 Specialized Version for Distinct Integers 

If it is known that all the integers of the list are distinct, associative sorting technique 
can be specialized because there is only one integer that can be practiced and mapped to 
a location creating a node. This means that 2 solutions are possible for lists of distinct 
keys. The first one is for read-only keys and instead of tagging the word as node using its 
most significant bit (MSB), the key itself can be used to tag the word "implicitly" as node 
without modifying it, since when a key is mapped to the imaginary subspace, it will always 
satisfy the monotone bijective hash function. The keys are "implicitly practiced" in this 
case. Hence, storing phase is enough to obtain the sorted permutation of the practiced 
interval cancelling the retrieval phase. In each iteration only the keys that fall into the 
range [8, 8 + n — 1] can be sorted where 8 is the minimum of the list of that iteration. It 
should be noted that, this variant is suitable for sorting a list S of n elements, S[0 . . .n — 1] 
each have an integer key where the problem is to sort the elements of the list according 
to their integer keys. 

The other scenario is that, when a distinct integer is mapped to the imaginary sub- 
space, its record can be used to improve the interval of range of integers that are practiced. 
During storing, each node is clustered at the beginning of the list together with its record 
retaining its relative order with respect to others. At this point, we need logn bits of 
the record to encode the node's absolute position to cue the retrieval of the integer from 
the imaginary subspace. But the tag bit can be released during storing phase since we 
only need how many nodes are stored at the beginning of the list in total. Hence we can 
use for instance the least significant w — logn bits of a record during practicing for any 
other purpose. It is immediate from this definition that a monotone bijective super hash 
function can be used during practicing. It should be noted that, this variant is suitable 
for sorting a list S of n integers, S[0 . . . n — 1] where the problem is to sort the integers in 
ascending or descending order. 

With this introductory information, the contributions of this study are, 
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A practical algorithm that sorts a list of n elements S[0 . . . n — 1] each have a read-only 
distinct integer key in the range [0,m — 1] using 0(1) extra space in 0(n + m) time for 
the worst, 0(m) time for the average (uniformly distributed keys) and 0(n) time for the 
best case. Therefore, the ratio — defines the efficiency (time-space trade-offs) letting very 
large lists to be sorted in-place. 

Practical comparisons with Q(n\ogn) quick sort [6j[7] and merge sort [8] and heap sort 
P|[TUj which take 0(nlogn) time on all inputs showed that associative sort is superior 
than all (up to 20 times) provided that — < clogra where c ~ 4 for both heap sort and 
merge sort. Quick sort gave worser results (c ~ 8) for distinct keys. These results are 
consistent with m calculated theoretically making average case time complexity of the 
algorithm less than lower-bound of comparison-based sorting algorithms, i.e., 0(m) < 
fi(nlogn). Another very important meaning of this inequality is that, since it does not 
require additional memory space other than a constant amount, no matter how large is 
the list, the proposed algorithm will sort faster than all provided that m = 0(nlogn). 

Practical comparisons with instable distribution counting sort (which shows better per- 
formance than stable one) showed that associative sort is superior in every case. This 
is expectable considering memory allocation overload of distribution counting sort since 
time-complexities of both algorithms are same. The performance of the algorithm is even 
asymptotically better than 2 lines of code referred in textbooks for sorting n distinct in- 
tegers from [0 ... n — 1] with indexing an auxiliary list B of the same size as the input A 
with keys of A by .BL4[z]] = A[i] for i — 0, 1, . . . , n — 1, and then reconstructing sorted per- 
mutation of A back by A[i] = B[i] for i — 0, 1, . . . , n — 1, which is due to time-consuming 
memory allocation of the auxiliary list. 

Associative sort for read-only distinct keys has been compared with radix sort [2l[3ll6|fTTj 
and bucket sort, as well. The results showed that it is superior than radix sort when 
— < 8 and 2 times faster than bucket sort for n distinct integer keys S[0...n — 1] each in 
the range [0, n — 1]. 

Finally, the dependency of the efficiency of associative sort on the distribution of the keys 
is only 0(n) which means it replaces all the methods based on address calculation [T2T 
[T7] . that are known to be very efficient when the keys have known (usually uniform) 
distribution and require additional space more or less proportional to n [TT] . 
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A practical algorithm that sorts n modifiable distinct integers S[0 . . .n — 1] each in the 
range [0, m — 1] using 0(1) extra space with an efficiency improvement of n ( w ™ og n ) ■ 111 
other words, if — < w — logn the complexity of the algorithm is strictly 0(n). Otherwise, 
it sorts the integers using 0(1) extra space in 0(n + w ™ ogn ) time for the worst, 0( w _^ ) 
time for the average (uniformly distributed integers) and 0(n) time for the best case. 
Similarly, the efficiency can be represented with — < c(w — log n) where the constant 
c > 1 is determined by the other sorting algorithms. When modifiable version is compared 
with read-only version, it has been observed that up to ^ < 10 read-only version is more 
efficient possibly due to bitwise operations involved in the modifiable version. Afterwards 
modifiable version becomes more efficient than read-only version augmenting it. 

Finally, the technique requires at least 0, at most In — k swaps, where k > is the 
number of iteration (or dept of recursion) to complete the sorting which sets the lower 
bound for number of data movements to complete a sorting. 

Even omitting its space efficiency for a moment, associative sort asymptotically out- 
performs all content based sorting algorithms when n is large relative to m. 

2 Definitions 

The definition of integer key sorting is: given a list Sofn elements, S[0 . . . n — 1] each have 
an integer key, the problem is to sort the elements of the list according to their integer 
keys. To prevent repeating statements like "integer of the element S[i\" , S[i] is used to 
refer the integer. 

The definition of integer sorting is: given a list S of n integers, S[0 . . .n — 1], the 
problem is to sort the integers in ascending or descending order. 
The notations used throughout the study are: 

(i) Universe of integers is assumed U = [0 ... 2 W — 1] where w is the fixed word length. 

(ii) Maximum and minimum integers of a list are, max(S') = max(a|a G S) and 
min(S') = min(a|a G S), respectively. Hence, range of the integers is, m = 
max(S') — min(iS') + 1. 

(iii) The notation B C A is used to indicated that B is a proper subset of A. 

(iv) For two lists S\ and S 2 , max(S'i) < min(S , 2 ) implies Si < S 2 - 

Universe of Integers. When an integer is first practiced, a node is created releasing w 
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bits of the integer free. One bit is used to tag the word as a node. Hence, it is reasonable 
to doubt that the tag bit limits the universe of integers because all the integers should be 
untagged and in the range [0, 2 W ~ 1 — 1] before being practiced. But, we can, 

(i) partition S into 2 disjoint sublists S\ < 2 w ~ l < S 2 in 0(n) time with well known 
in-place partitioning algorithms as well as stably with [18], 

(ii) shift all the integers of S 2 by — 2 W ~ 1 , sort Si and S 2 associatively and shift S 2 by 
2w-i 

There are other methods to overcome this problem. For instance, 

(i) sort the sublist S[0 . . . (n/logn) — 1] using the optimal in-place merge sort [19] . 

(ii) compress S[0 . . . (n/ logn) — 1] by Lemma 1 of [20] generating Q(n) free bits, 
(hi) sort S[(n/ logra) . . .n — 1] associatively using Q(n) free bits as tag bits, 

(iv) uncompress S[0 . . . (n/ log n) — 1] and merge the two sorted sublists in-place in linear 
time by [IS]. 

Number of Integers. If practicing a distinct integer lets us to use w — 1 bits to practice 
other occurrences of that integer, we have w — 1 free bits by which we can count up to 
2 w ~ l occurrences including the first integer that created the node. Hence, it is reasonable 
to doubt again that there is another restriction on the size of the lists, i.e., n < 2 w ~ l . But 
a list can be divided into two parts in 0(1) time and those parts can be merged in-place 
in linear time by [19] after sorted associatively. 

It should be noted that these restrictions are only valid for the variant proposed for 
modifiable integers. Hence, for the sake of simplicity, it will be assumed that n < 2 w ~ l 
and all the integers are in the range [0, 2 W ~ 1 — 1] throughout the study. 

3 Basics of Associative Sort 

Given n distinct integers S[0...n — 1] each in the range [u, v], if m — n, the sorted 
permutation of the list can be represented with two parameters (2 log U bits) one of 
which is the initial address of the sequential memory separated for the list (accessed by 
S[0}) in the RAM and the other is the § = u. The zth integer of the sorted list can 
be calculated by S[i] = i + 5 and the actual value at ith location is meaningless for 
this calculation. Hence, if S is the sorted permutation, then there is a bijective relation 
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between each integer and its position, i.e., % = S[i] —5. From contradiction, if S is not the 
sorted permutation, i ^ S[i] — S implies that the integer S[i] is not at its exact location. 
Its exact location can be calculated with j = S[i] — 5. Therefore, this monotone injective 
hash function that maps the integers to j e [0, n — 1] can sort the list in 0(n) time using 
(9(1) constant space. This is cycle leader permutation where S is re-arranged by following 
the cycles of a permutation it. First S[0] is sent to its final position 7r(0) (calculated by 
j = S[i] — 5). Then the element that was in 7r(0) is sent to its final position it(tt(0)). The 
process proceeds in this way until the cycle is closed, that is until the integer to position 
is found which means the association = S[0] — 5 is constructed between the first integer 
and its position. Then the iterator is increased to continue with the integer of S[l}. At 
the end, when all the cycles of S[i] for i — 0, l..,n — 1 are processed, all the integers are 
moved to their exact position and the association i = S[i] — 5 is constructed between the 
integers and their positions, i.e., the sorted permutation of the list is obtained. 

If we look at this approach closer, we can interpret the technique entirely different. 
That is, we are indeed creating an imaginary subspace Jm[0 ... n — 1] over S[0 . . . n — 1] 
where the relative basis of this imaginary subspace coincides with that of the list space 
in the physical memory. The imaginary subspace spans a predefined interval of the range 
of integers depending on n. Since m — n, it spans the entire range of integers. The 
association between the list space and the imaginary subspace is created by a node using 
the monotone bijective hash function i = S[i] — 5 that maps a particular integer to the 
imaginary subspace. When a node is created for a particular integer, the redundancy 
due to the association between the integer and the position of the node releases the word 
allocated for the integer in the physical memory. Hence, we can clear the node (S[i] = 0) 
and set its tag bit, for instance its most significant bit (MSB) to discriminate it as a 
node, and use the remaining w — 1 bits of the node for any other purpose. When we want 
the integer back to list space from imaginary subspace, we can use the inverse of hash 
function and get the integer back by S[i] = i + 5 to the list space. However, we don't use 
free bits of a node for other purposes in this case because it is known that all the integers 
are distinct and hence only one integer will be practiced at a location creating a node. 
Therefore, instead of tagging the word as node using its MSB, we use the integer itself to 
tag the word "implicitly" as node, since if a integer is mapped to the imaginary subspace, 
then it will always satisfy the monotone bijective hash function i — S[i] — 5. Hence, the 
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integers are "implicitly practiced" in this case. 

3.1 Sorting n Distinct Read-Only Integer Keys 

The above definition immediately lets us to state that, 

Lemma 3.1. Given a list S of n elements S[0...n — 1] each have a distinct integer key 
in the range [u,v], elements of the list that have keys in the range [8, 8 + n — 1] with 
5 = min(iS') can be sorted associatively at the beginning of the list in 0(n) time with at 
most 2nd — I at least swaps, respectively using only Oil) constant space. 

Proof. Given n distinct integer keys S[0...n — 1] each in the range [u, v], it is not possible to 
construct a monotone bijective hash function (minimal monotone perfect hash function) 
that maps all the keys of the list into j G [0,n — 1] without additional storage space less 
than Q(n + log if) bits [2T]. However, a bijective hash function can be constructed as 
a partial function [22] that assigns each key of S\ C S in the range [8, 8 + n — 1] with 
5 = min(S') to exactly one element in j G [0, n — 1]. The partial hash function of this 
form is, 

j = S[i]-6 if S[i]-S<n (3.1) 

With this definition, the proof has two basic steps of associative sort: 

Algorithm A. Implicitly practice all the distinct keys in the range [5, 5 + n — 1] by 
mapping them into the imaginary subspace lm[0...n — 1] over S[0...n — 1] using Eqn. 13.11 
Assuming that the minimum of the list 5 = min(S') is known, this is, 

Al. set % = 0; 

A2. if S[i] — 8 > n, then S[i] is a key of S2 that is out of the practiced interval. Hence, 
increase n' d that counts the number of keys in S 2 , update 8' = min(8', S[i}), increase 
i and repeat this step; 

A3, if i = S[i] — 8, then S[i] is a node. Hence, increase i and goto step IA21 

A4. otherwise, S[i] is a key of S\ to be practiced. Hence, swap S[i] with S[j] where 
j = S[i] — 8. Increase % if j < i. Goto step IA2j 
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Algorithm B. Implicitly store all the practiced keys which satisfy the monotone bijec- 
tive hash function % = S[i] —5. If a key satisfies the hash function, then it is indeed a 
node of the imaginary subspace. From algorithm point of view, this is partitioning the 
list into practiced and unpracticed keys, i.e., 

Bl. set i = 0, j = and k = n d ; 

B2. if i 7^ S[i] — 5, then S[i] is a key of S2 that is out of the practiced interval. Increase 
i and repeat this step; 

B3. otherwise, S[i] is a node. Hence, swap S[i] with S[j]. Increase % and j and decrease 
k. if k = exit, otherwise goto step T 



As it is foreknown that all the keys of the list are distinct, assuming all rid keys in the 
practiced interval are located at wrong positions, n<j keys are swapped while associations 
are created during implicitly practicing phase. Afterwards, those nd keys are stored 
(clustered) at the beginning of the list retaining their relative order, with at most nd — 1 
swaps resulting in the sorted permutation of the practiced interval. On the other hand, 
if the keys in the range [8, 8 + n — 1] are consecutive, 

a, a + 1, a + 2, . . . 

and already located at the beginning of the list in order, no any integer is swapped during 
practicing and storing phases of associative sort. 

□ 



Lemma 13.11 proves that all the keys in a given interval spanned by the imaginary 
subspace can be sorted associatively in 0(n) time using 0(1) constant space, provided 
that the range of this interval is upper bounded by n. But the solution can be applied 
either sequentially or recursively to the entire list until all the keys are sorted. The 
sequential version is, 

i. find min(jS') and max(S'); 

ii. initialize 8 = min(S'), 5' = max(S'), n' d = 0; 



iii. implicitly practice all the distinct keys using Algorithm A 



iv. implicitly store all the distinct keys using Algorithm B 

10 



v. If rid = n> exit. Otherwise, set S = S[rid — 1 . . . n — 1], n = n—rid, 5 = 5 1 , 5' = max(S'), 
reset counters n d and n' d and goto step 



in, 



Remark 3.1. It should be noted that min(S') and max(S') need not be found in step 
Instead, if 5 = and 8' = max(U) the algorithm sorts the keys in the range [0, n — 1] 
during the first iteration (or recursion). However, if there is not any key in this interval, 



Algorithm A| finds 8' = min(S) in step (iii) in 0(n) time, and the sorting continues with 
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the keys in [8', 8' + n — 1] in the next iteration (or recursion). 

Remark 3.2. Associative sort technique is on-line in the sense that after each step 
rid keys of S\ are sorted at the beginning of the list and ready to be used. 

Remark 3.3. If there is satellite information along with the list S, they should be swapped 
together with the keys at step IA4I and | 



Best Case Complexity. Given n integer keys S[0 . . . n—1], if n— 1 keys satisfy S^z] — 8 < 
n, then these keys are sorted in 0(n) time. In the next step, there is one key left which 
implies sorting is finished. As a result, time complexity of the algorithm is lower bounded 
by Q(n) in the best case. 

Worst Case Complexity. Given n integer keys S[0 . . .n — 1] and m = (3n, if there is 
only 1 key available in the practiced interval at each iteration (or recursion) until the last, 
in any jth step, the only key s that will be sorted satisfies s < jn — (j — 1) which implies 
that the last alone key satisfies s < jn — (j — 1) < (3n from where we can calculate j by 
j < ^n-i ■ I n ^ n * s case 5 t ne time complexity of the algorithm is, 

0(n) +0(n-l) + ... + 0(n- j) = (j + l)O(n) - O(f) < (/? + \)0{n) (3.2) 

Therefore, the algorithm is upper bonded by ((3 + l)0{n) = 0{m + n) in worst case. 

Average Case Complexity. Given n integer keys S[0 . . . n — 1], if m — fin and the 
keys are uniformly distributed, this means that ^ keys satisfy S[i] < n. Therefore, the 
algorithm is capable of sorting ^ keys in 0(n) time during first pass. This will continue 
until all the keys are sorted. The sum of sorted keys in each iteration can be represented 
with the series, 

n n(B-l) n(B-l) k - 1 , , 
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It is reasonable to think that the sorting ends when one term is left which means the sum 
of k terms of this series is equal to n — 1, from where we can calculate the number of 
iteration or dept of recursion k which is valid when (3 > 1 by, 

- = J 3.4 

n p k 

It is seen from Eqn. 13.41 that when m = 2n, i.e., — 2, number of iteration or dept of 
recursion becomes k = logn and the complexity is the recursion T{n) = T(|) + 0(n) 
yielding T(n) = 0(n). It is known that each step takes 0(n) time. Therefore, the time 
complexity of the algorithm is, 

{ n )+ 0(^) + ... + 0(^^) (3.5) 
from where we can obtain by defining x = , 

0(n) (1 + x + x 2 + x 3 + • • • + x^ 1 ) = 0(n)(— ^—-) < 0O(n) (3.6) 

1 — x 1 — x 

which means that the algorithm is upper bounded by 0O(n) or 0(m) in the average case. 

More on Complexity The time complexity of the overall algorithm is upper bounded 
by Oin + m) and lower bounded by Q(n). Upper bound 0(n + m) does not mean that 
the complexity is unbounded when m > n c with c > 2. In each iteration, min^) is 
found and the new iteration starts with this minimum. Hence, the number of iteration or 
recursion is bounded by the number of keys, i.e., k < n. This means that the complexity 
is upper bounded by 0(n 2 ) regardless of how much big is the ratio ^. 

Number of total swaps is at most 2n—k because in each implicitly practicing and implicitly 
storing phases at most 2nd — 1 keys of Si are swapped which sum to 2n — k overall where 
k > 1 is the number of iteration (or dept of recursion) to complete the sorting. If k = n, 
then the the list is sorted with at most n — 1 swaps in 0(n 2 ) time. 

If the list is already sorted and the keys of Si are always consecutive or only one key is 
available in the practiced interval all the time, then no any swap occurs. 



4 Sorting n Distinct Modifiable Integers 

In this section, the associative sorting technique for distinct modifiable integers will be 
introduced with its three basic steps: (i) practicing, (ii) storing and (iii) retrieval. This 
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is an integer sorting problem not an integer key sorting problem. Hence, the definition 
degenerates to: given a list Sofn integers, S[0 . . . n — 1], the problem is to sort the integers 
in ascending or descending order. 

Once a node is created for a particular integer when it is practiced, the redundancy 
due to the association between the integer and the node releases the word allocated for 
the integer in the physical memory except one bit which is used to tag the word as node 
of the imaginary subspace for interrogation. The released w — 1 bits of a node become its 
record. Hence, we can improve the associative sort for distinct integers using the record of 
each node for other purposes. After practicing, each node is clustered at the beginning of 
the list together with its record retaining its relative order with respect to others during 
storing. At this point, we need logn bits of the record to encode the node's absolute 
position as the cue. But the tag bit is released during storing since we only need to 
know how many nodes are stored at the beginning of the list in total. Hence we can use 
w — log n bits of a record during practicing for any other purpose. It is immediate from 
this definition that, 

Lemma 4.1. Given n distinct integers S[0...n — 1] each in the range [u,v], all the n^ 
integers in the range [5, 5 + (w — logn)n — 1] with 5 = min(S') can be sorted associatively 
at the beginning of the list in 0(n) time using only 0(1) constant space. 

Given n distinct integers S[0...n — 1] each in the range [u,v], a monotone bijective 
super hash function can be constructed as a partial function that assigns each integer of 
Si C S in the range [5, 5 + (w — logn)n — 1] with 5 = min(S') to exactly one element in 
j G [0, n — 1] and one element in k G [0, (w — logn) — 1]. The simplest monotone bijective 
partial super hash function of this form is, 

j = (S[i] — 5) div (w — logn) if S[i] — 5 < (w — log n)n (4.1) 

k = (S[i] — 5) mod (w — logn) if S[i] — 5 < (w — \ogn)n (4.2) 

In this case, w — logn integers may collide and mapped to the same node created at 
j G [0, n — 1] (Eqn. 14. ip in the imaginary subspace. But we can use w — logn free bits of a 
record of the node to encode which of w — log n distinct integers are mapped to the same 
node by setting the corresponding bit determined by k (Eqn. I4.2p . In other words, now 
the imaginary subspace is two dimensional over the list space where the first dimension 
along the list designates the node position and the second dimension along the bits of 
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the node uniquely determines the integers which are mapped to the imaginary subspace 
through that node. 

Proof. With this definition, the proof has three basic steps of associative sort: 

Algorithm C. Practice all the distinct integers of the interval [5, 5+ (w — log n)n — 1] by 
mapping them to the node determined by Eqn. 14. 1 l in the imaginary subspace lm[0...n — 1] 
over S[0...n — 1]. Once a integer is mapped to a node, set the integer's unique bit in the 
record determined by Eqn. 14. 21 which discriminates it from the others mapped to the same 
node. It is assumed that minimum of the list 5 = min S is known. 

CI. set i = 0; 

C2. if S[i] < 5, then S[i] is an idle integer of an interval that has already been sorted in 
the previous iterations (or recursions). Hence, increase % and repeat this step; 

C3. if MSB of S[i] is 1, then S[i] is a node. Hence, increase i and goto step 1021 

C4. if S[i] —5 > (w— log n)n then S[i] is a integer of S2 that is out of the practiced interval. 
Increase n' d that counts the number of integers of 5 2 , update 5' = min(6' , S[i]) , 
increase i and goto to step IC2( 

C5. calculate j and k using Eqn. 14.11 and HT2"} respectively; 

C6. if MSB of S[j] is 0, then S[i] is the first occurrence which will create the node at 
S[j}. Hence, move S[j] to S[i], clear S[j] and set MSB and kth bit of S[j] to 1. If 
j < % increase i. Increase rid that counts the number of distinct integers and hence 
the nodes, and goto step IC2I 

C7. otherwise, a node has already been created at S[j] by another occurrence of S[i}. 
Hence, set kth bit of S[j] (without touching others) and increase i and n c that 
counts number of total idle integers over all distinct integers, and goto step IC2j 

Algorithm D. Store all the integers of the practiced interval in the short term memory 
by clustering the nodes at the beginning of the list while retaining their relative order 
with respect to each other. During clustering encode absolute position of each node into 
its record's most significant logn bits. 
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Dl. set i = 0, j = and k = rid, 

D2. if MSB of S[i] is 0, then S[i] is not a node, hence increase % and repeat this step; 

D3. if MSB of S[i] is 1, then S[i] is a node. Hence, clear MSB of S[i], encode i into most 
significant logn bits of S[i] and swap S[i] with S[j}. Increase % and j and decrease 
k. If k = exit, otherwise goto stepl 



Algorithm E. Retrieve rid + n c integers of S\ from records of short term memory 
S[0 . . . rid — 1] to construct sorted permutation of rid+n c integers of Si. Process the records 
from right to left backwards and expand the integers over S[0 . . . rid + n c — 1] sequentially 
right to left backwards. 

El. initialize i = rid — 1 and p = rid + n c — 1; 

E2. initialize k = w — logn — 1 and decode absolute position j of the node from most 
significant logra bits of its record at S[i]; 

E3. while k > 0; 

(i) if fcth bit of S[i] is 1, then S[p] — j(w — logn) + k + 5. Decrease k and p and 
repeat this step. 

(ii) otherwise, only decrease k and goto step (i). 



E4. decrease i and goto step | 

□ 

Sequential Version Instead of using stack space, a sequential version can be developed. 
After storing, n c idle integers of Si and n' d integers of 5*2 are distributed disorderly together 
at S[n d ... n — 1]. If we partition S[rid ... n — 1] selecting the pivot equal to 5, then idle 



integers are clustered after rid records of the short term memory. Therefore, | Algorithm E 
can immediately be used to retrieve. In such a case, the structure of the overall algorithm 
becomes: 

i. find min(S') and max(S'); 

ii. initialize 5 = min(S'), 5' = max(S'), rid — 0, n c = 0, n' d = 0; 



iii. practice all the integers using Algorithm C 
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iv. store all the practiced integers using Algorithm D 

v. partition S[n d . . .n — 1] clustering n c idle integers of Si to the beginning using any 
in-place partitioning algorithm; 

vi. retrieve the sorted permutation of the practiced interval using Algorithm E 



vii. If n' d — exit. Otherwise set S = S[n d + n c . . . n — 1]. Update n = n — n d , 5 = 5', 
5' = max (5"), reset counters n d , n c and n' d and goto step 

Remark 4.1. min(S') and max(S') need not be found in step 



m 



Instead, if 5 = and 



5' = max(U) the algorithm sorts the integers in the range [5, 5 + (w — logn)n — 1] during 
the first iteration (or recursion). However, if there is not any integer in this interval, 



Algorithm C finds 5' = min(S') in step IC4I in 0(n) time, and continues with the integers 
in [5', 5' + (w — \ogn)n — 1]. 

Remark 4.2. Sequential version of associative sort technique is on-line in the sense that 
after each step 



vi 



n d + n c integers are added to the sorted permutation at the beginning 



of the list and ready to be used. 

Recursive Version Saving constants n, n d , n c and 5 in stack space, we can recursively 



call Algorithm C and Algorithm D with three parameters n = n c + n' d , S = S[n d . . . n — 1] 
and min(S') = 5'. The last parameter prevents searching the minimum of the list in 
each level of recursion. Although the exact number of integers of S2 is n' d , the number 
of integers of S in the new recursion is n c + n' d where n c of them are idle integers of 
Si and meaningless while sorting 5*2. However, this will increase the interval of range 
of integers spanned by the imaginary subspace improving the overall time complexity in 
each level of recursion. The recursion can continue until no any integer exists in 5*2- In 
the last recursion, retrieval phase can begin to construct the sorted permutation of n d + n c 
integers of Si from n d records stored at the short term memory S^O . . . n d — 1] and expand 
over S[0...n — 1] sequentially right to left backwards. Each level of recursion should 
return the total number of integers expanded on the list to the higher level to let it know 
where it will start to expand its interval. 



Worst Case Complexity Algorithm C] |Algorithm D and Algorithm E are together ca- 
pable of sorting integers that satisfy S[i] — 5 < (w — logn)n in 0(n) time. If we as- 
sume m = (3n with /3 > w — log n, and there is only one integer available that satisfies 
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S[i] — 5 < (w — logn)n in each iteration or recursion until the last, in any jth step, the 
only integer s of Si that will be sorted satisfies, 



s — 5 < j(w — logn)n — (j — 1) 



(4.3) 



Eqn. 14.31 implies that the last alone integer of 5* satisfies, 



s — 5 < j(w — logn)n — (j — 1) < (3n 



(4.4) 



from where can calculate j by, 



f3n - 1 



(4.5) 



(w — logn)n — 1 



In this case, the time complexity of the algorithm is 



0(n)+0(n -l) + ... + 0(n-j) = (j + l)C(n) - 0{f) < ( 



w — log n 



+ l)0(n) (4.6) 



Therefore, the time complexity of the algorithm in worst case is upper bounded by 0(n + 



Best Case If n — 1 integers satisfy S[i] — 5 < (w — log n)n, then these are sorted in 0(n) 
time. In the next step, there is ml — 1 integer left which implies sorting is finished. As a 
result, time complexity of the algorithm is lower bounded by Q(n) in the best case. 

Average Case If we assume m = (3n(w — logn) with (3 > 1, and the integers are 
uniformly distributed, this implies ^ integers satisfy S[i] — 5 < (w — log n)n. Therefore, 
the algorithm is capable of sorting ^ integers of the list in 0(n) time at first step and 
n' = n — = n ^~^ integers will be left where K of them will be sorted in the next 
step. This will continue until all the integers are sorted. The complexity in this case 
is exactly equal to the complexity that we obtained for associative sorting of read-only 
distinct integers, which means that the time complexity of the sorting algorithm is upper 
bounded by f30(n) or 0{ w ^ ogn ) for uniformly distributed lists. 

Practical Experience When modifiable version is compared with read-only version, it 
has been observed that up to — < 10 read-only version is more efficient possibly due 
to bitwise operations involved in the modifiable version. Afterwards modifiable version 
becomes more efficient than read-only version augmenting it. 



m 



W — log 71 
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5 Conclusions 



In this study, in-place associative integer sorting technique is specialized with two variants 
one for read-only integer keys and the other for modifiable integers. Both techniques are 
very simple and straightforward and around 30 lines of C code is enough. 

The read-only variant sorts the keys using 0(1) extra space in 0{n + m) time for the 
worst, 0(m) time for the average (uniformly distributed keys) and 0(n) time for the best 
case. It shows similar characteristics with bucket sort and distribution counting sort but 
it is in-place and hence time-space efficient. The ratio — defines the efficiency (time-space 
trade-offs) letting very large lists to be sorted in-place. Hence, since it does not require 
additional memory space other than a constant amount, no matter how large is the list, 
the proposed algorithm will sort faster than all provided that m = 0{n\ogn). It should 
be noted that, this variant is suitable for integer key sorting problems where satellite 
information is available along with the integer keys. 

On the other hand, the modifiable integer variant sorts the integers using 0(1) extra 

space in 0(n H ?p — ) time for the worst, 0( — ^ — ) time for the average (uniformly 

distributed keys) and 0(n) time for the best case. It should be noted that, this variant is 
not suitable for integer key sorting problems where satellite information is available along 
with the integer keys. It is suitable for integer sorting problems. It has been observed 
practically that up to — < 10 read-only version is more efficient possibly due to bitwise 
operations involved in the modifiable version. Afterwards modifiable version becomes 
more efficient than read-only version augmenting it. 
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